(54) Methods for motion estimation with adaptive motion accuracy

(57) Methods for motion estimation with adaptive
motion accuracy of the present invention include several
techniques for computing motion vectors of high pixel
accuracy with a minor increase in computation. One
technique uses fast-search strategies in sub-pixel
space that smartly search for the best motion vectors.
An alternate technique estimates highly accurate motion
vectors using different interpolation filters at different
stages in order to reduce computational complexity. Yet
another technique uses rate-distortion criteria that
adapt according to the different motion accuracies to
determine both the best motion vectors and the best
motion accuracies. Still another technique uses a VLC
table that is interpreted differently at different coding
units, according to the associated motion vector accuracy.



FIG. 6 (excerpt of flowchart text): Denote the best motion vector of the eight as V3 (116). Check the new motion
vector candidates in the grid of 1/6-pixel resolution of radius 1 that is centered on V3; some of the candidates in
the grid have already been tested and can be skipped (118).






EP 1 073 276 A2 



Description 

BACKGROUND OF THE INVENTION 

[0001] The present invention relates generally to a method of compressing or encoding digital video with bits and,
specifically, to an effective method for estimating and encoding motion vectors in motion-compensated video coding.
[0002] In classical motion estimation the current frame to be encoded is decomposed into image blocks of the same
size, typically blocks of 16x16 pixels, called "macroblocks." For each current macroblock, the encoder searches for the
block in a previously encoded frame (the "reference frame") that best matches the current macroblock. The coordinate
shift between a current macroblock and its best match in the reference frame is represented by a two-dimensional
vector (the "motion vector") of the macroblock. Each component of the motion vector is measured in pixel units.
[0003] For example, if the best match for a current macroblock happens to be at the same location, as is the typical
case in stationary background, the motion vector for the current macroblock is (0,0). If the best match is found two pixels
to the right and three pixels up from the coordinates of the current macroblock, the motion vector is (2,3). Such motion
vectors are said to have integer pixel (or "integer-pel" or "full-pel") accuracy, since their horizontal X and vertical Y
components are integer pixel values. In FIG. 1, the vector V1 = (1,1) represents the full-pel motion vector for a given
current macroblock.
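
The full-pel search described above can be sketched as follows. This is a minimal illustration, not the encoder's actual code: the function names, the use of the sum of absolute differences as the matching measure, and the search radius are assumptions made for the example. The vertical sign convention follows the (2,3) example in the text, where a positive Y component means "up" (a smaller row index).

```python
def sad(cur_block, ref, top, left, size):
    """Sum of absolute differences between cur_block and the ref block at (top, left)."""
    return sum(abs(cur_block[r][c] - ref[top + r][left + c])
               for r in range(size) for c in range(size))


def full_pel_search(cur_block, ref, top, left, size, radius):
    """Return the integer-pel motion vector (dx, dy) of the best match.

    (top, left) is the position of the current block in its own frame.
    dx > 0 means the match lies to the right; dy > 0 means it lies above.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_vector = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r0, c0 = top - dy, left + dx  # "up" is a smaller row index
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur_block, ref, r0, c0, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector


# Hypothetical 12x12 reference frame in which every pixel value is unique.
ref_frame = [[100 * r + c for c in range(12)] for r in range(12)]
# The current 4x4 block sits at row 5, column 4, and its content equals the
# reference block found two pixels to the right and three pixels up.
current_block = [row[6:10] for row in ref_frame[2:6]]
vector = full_pel_search(current_block, ref_frame, top=5, left=4, size=4, radius=3)
```

A real encoder would use 16x16 macroblocks and a larger search window; the small sizes here only keep the example short.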

[0004] Moving objects in a video scene do not move in integer pixel increments from frame to frame. True motion
can take any real value along the X and Y directions. Consequently, a better match for a current macroblock can often
be found by interpolating the previous frame by a factor NxN and then searching for the best match in the interpolated
frame. The motion vectors can then take values in increments of 1/N pixel along X and Y and are said to have 1/N pixel
(or "1/N-pel") accuracy.

[0005] In "Response to Call for Proposals for H.26L," ITU-Telecommunications Standardization Sector, Q.15/SG16,
doc. Q15-F-11, Seoul, Nov. 98, and "Enhancement of the Telenor proposal for H.26L," ITU-Telecommunications
Standardization Sector, Q.15/SG16, doc. Q15-G-25, Monterey, Feb. 99, Gisle Bjontegaard proposed using 1/3-pel accurate
motion vectors and cubic-like interpolation for the H26L video coding standard (the "Telenor encoder"). To do this, the
Telenor encoder interpolates or "up-samples" the reference frame by 3x3 using a cubic-like interpolation filter. This
interpolated version requires nine times more memory than the reference frame. At a given macroblock, the Telenor
encoder estimates the best motion vector in two steps: the encoder first searches for the best integer-pel vector V1 and
then the Telenor encoder searches for the best 1/3-pixel accurate vector V1/3 near V1. Using FIG. 1 as an example, a
total of eight blocks (of 16x16 pixels) in the 3x3 interpolated reference frame are checked to find the best match which,
as shown, is the block associated with the motion vector V = (Vx, Vy) = (1+1/3, 1). The Telenor encoder has several
problems. First, it uses a sub-optimal fast-search strategy and a complex cubic filter (at all stages) to compute the
1/3-pel accurate motion vectors. As a result, the computed motion vectors are not optimal and the memory and computation
requirements are very expensive. Further, the effective rate-distortion criteria of the Telenor encoder use an accuracy
that is fixed at 1/3-pixel and, therefore, do not adapt to select better motion accuracies. Similarly, the Telenor encoder
variable-length code ("VLC") table has an accuracy fixed at 1/3-pixel and, therefore, is not adapted and interpreted
differently for different accuracies.

[0006] Most known video compression methods estimate and encode motion vectors with 1/2-pixel accuracy,
because early studies suggested that higher or adaptive motion accuracies would increase computational complexity
without providing additional compression gains. These early studies, however, did not estimate the motion vectors using
optimized rate-distortion criteria, did not exploit the convexity properties of such criteria to reduce computational
complexity, and did not use effective strategies to encode the motion vectors and their accuracies.
[0007] One such early study was Bernd Girod's "Motion-Compensating Prediction with Fractional-Pel Accuracy,"
IEEE Transactions on Communications, Vol. 41, No. 4, pp. 604-612, April 1993 (the "Girod work"). The Girod work is
the first fundamental analysis on the benefits of using sub-pixel motion accuracy for video coding. Girod used a simple,
hierarchical strategy to search for the best motion vector in sub-pixel space. He also used simple mean absolute
difference ("MAD") criteria to select the best motion vector for a given accuracy. The best accuracy was selected using a
formula that is not useful in practice since it is based on idealized assumptions, is very complex, and restricts all motion
vectors to have the same accuracy within a frame. Finally, Girod focused only on prediction error energy and did not
address how to use bits to encode the motion vectors.

[0008] Another early study was Smita Gupta's and Allen Gersho's "On Fractional Pixel Motion Estimation," Proc.
SPIE VCIP, Vol. 2094, pp. 408-419, Cambridge, November 1993 (the "Gupta work"). The Gupta work presented a
method for computing, selecting, and encoding motion vectors with sub-pixel accuracy for video compression. The
Gupta work disclosed a formula based on mean squared error ("MSE") and bilinear interpolation, used this formula to
find an ideal motion vector, and then quantized such vector to the desired motion accuracy. The best motion vector for
a given accuracy was found using the sub-optimal MSE criteria and the best accuracy was selected using the largest
decrease in difference energy per distortion bit, which is a greedy (sub-optimal) criterion. A given motion vector was
coded by first encoding that vector with 1/2-pel accuracy and then encoding the higher accuracy with refinement bits.
Coarse-to-fine coding tends to require significant bit overhead.

[0009] In "On the Optimal Motion Vector Accuracy for Block-Based Motion-Compensated Video Coders," Proc.
IS&T/SPIE Digital Video Compression: Algorithms and Technologies, pp. 302-314, San Jose, February 1996 (the "Ribas
work"), Jordi Ribas-Corbera and David L. Neuhoff modeled the effect of motion accuracy on bit rate and proposed
several methods to estimate the optimal accuracies that minimize bit rate. The Ribas work set forth a full-search approach
for computing motion vectors for a given accuracy and considered only bilinear interpolation. The best motion vector
was found by minimizing MSE and the best accuracy was selected using some formulas derived from a rate-distortion
optimization. The motion vectors and accuracies were encoded with frame-adaptive entropy coders, which are complex
to implement in real-time applications.

[0010] In "Proposal for a new core experiment on prediction enhancement at higher bitrates," ISO/IEC
JTC1/SC29/WG11 Coding of Moving Pictures and Audio, MPEG 97/1827, Sevilla, Feb. 1997 and "Performance
Evaluation of a Reduced Complexity Implementation for Quarter Pel Motion Compensation," ISO/IEC JTC1/SC29/WG11
Coding of Moving Pictures and Audio, MPEG 97/3146, San Jose, Jan. 1998, Ulrich Benzler proposed using 1/4-pel
accurate motion vectors for the video sequence and more advanced interpolation filters for the MPEG4 video coding
standard. Benzler, however, used Girod's fast-search technique to find the 1/4-pel motion vectors. Benzler did
consider different interpolation filters, but proposed a complex filter at the first stage and a simpler filter at the second stage
and interpolated one macroblock at a time. This approach does not require much cache memory, but it is
computationally expensive because of its complexity and because all motion vectors are computed with 1/4-pel accuracy for all the
possible modes in a macroblock (e.g., 16x16, four-8x8, sixteen-4x4, etc.) and then the best mode is determined.
Benzler used the MAD criteria to find the best motion vector, which was fixed to 1/4-pel accuracy for the whole sequence,
and hence he did not address how to select the best motion accuracy. Finally, Benzler encoded the motion vectors with
a variable-length code ("VLC") table that could be used for encoding 1/2 and 1/4 pixel accurate vectors.
[0011] The references discussed above do not estimate the motion vectors using optimized rate-distortion criteria
and do not exploit the convexity properties of such criteria to reduce computational complexity. Further, these
references do not use effective strategies to encode motion vectors and their accuracies.

BRIEF SUMMARY OF THE INVENTION 

[0012] One preferred embodiment of the present invention addresses the problems of the prior art by computing
motion vectors of high pixel accuracy (also denoted as "fractional" or "sub-pixel" accuracy) with a minor increase in
computation.

[0013] Experiments have demonstrated that, by using the search strategy of the present invention, a video encoder
can achieve significant compression gains (e.g., up to thirty percent in bit rate savings over the classical choices of
motion accuracy) using similar levels of computation. Since the motion accuracies are adaptively computed and
selected, the present invention may be described as adaptive motion accuracy ("AMA").

[0014] One preferred embodiment of the present invention uses fast-search strategies in sub-pixel space that
smartly search for the best motion vectors. This technique estimates motion vectors in motion-compensated video
coding by finding a best motion vector for a macroblock. The first step is searching a first set of motion vector candidates
in a grid of sub-pixel resolution of a predetermined square radius centered on V1 to find a best motion vector V2. Next,
a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square radius centered on
V2 is searched to find a best motion vector V3. Then, a third set of motion vector candidates in a grid of sub-pixel
resolution of a predetermined square radius centered on V3 is searched to find the best motion vector of the macroblock.
[0015] In an alternate preferred embodiment of the present invention, a technique for estimating highly accurate motion
vectors may use different interpolation filters at different stages in order to reduce computational complexity.

[0016] Another alternate preferred embodiment of the present invention selects the best vectors and accuracies in
a rate-distortion ("RD") sense. This embodiment uses rate-distortion criteria that adapt according to the different
motion accuracies to determine both the best motion vectors and the best motion accuracies.
[0017] Still further, another alternate preferred embodiment of the present invention encodes the motion vector and
accuracies with an effective VLC approach. This technique uses a VLC table that is interpreted differently at different
coding units, according to the associated motion vector accuracy.

[0018] The foregoing and other objectives, features, and advantages of the invention will be more readily
understood upon consideration of the following detailed description of the invention, taken in conjunction with the
accompanying drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING
[0019] 

FIG. 1 is a diagram of exemplary full-pel and 1/3-pel locations in velocity space.

FIG. 2 is a flowchart illustrating a prior art method for estimating the best motion vector.

FIG. 3 is a diagram of an exemplary location of motion vector candidates for full-search in sub-pixel velocity space.

FIG. 4 is a flowchart illustrating a full-search preferred embodiment of the method for estimating the best motion
vector of the present invention.

FIG. 5 is a diagram of an exemplary location of motion vector candidates for fast-search in sub-pixel velocity space.

FIG. 6 is a flowchart illustrating a fast-search preferred embodiment of the method for estimating the best motion
vector of the present invention.

FIG. 7 is a detail flowchart illustrating an alternate preferred embodiment of step 114 of FIG. 6.

FIG. 8 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Container" video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 9 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "News" video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 10 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Mobile" video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 11 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Garden" video sequence, with SIF resolution, and at the frame rate of 15 frames per second.

FIG. 12 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Garden" video sequence, with QCIF resolution, and at the frame rate of 15 frames per second.

FIG. 13 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Tempete" video sequence, with SIF resolution, and at the frame rate of 15 frames per second.

FIG. 14 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Tempete" video sequence, with QCIF resolution, and at the frame rate of 15 frames per second.

FIG. 15 is a graphical representation of experimental performance results of the Telenor encoder with and without
AMA in the "Paris shaked" video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

FIG. 16 is a graphical representation of experimental performance results of fast-search ("Telenor FSAMA+c") and
full-search ("Telenor AMA+c") strategies in the "Mobile" video sequence, with QCIF resolution, and at the frame
rate of 10 frames per second.

FIG. 17 is a graphical representation of experimental performance results of fast-search ("Telenor FSAMA+c") and
full-search ("Telenor AMA+c") strategies in the "Container" video sequence, with QCIF resolution, and at the frame
rate of 10 frames per second.

FIG. 18 is a graphical representation of experimental performance results of tests using only one reference frame
for motion compensation as compared to tests using multiple reference frames for motion compensation in the
"Mobile" video sequence, with QCIF resolution, and at the frame rate of 10 frames per second.

DETAILED DESCRIPTION OF THE INVENTION

[0020] The methods of the present invention are described herein in terms of the motion accuracy being modified
at each image block. These methods, however, may be applied when the accuracy is fixed for the whole sequence or
modified on a frame-by-frame basis. The present invention is also described as using Telenor's video encoders (and
particularly the Telenor encoder) as described in the Background of the Invention. Although described in terms of
Telenor's video encoders, the techniques described herein are applicable to any other motion-compensated video coder.
[0021] Most video coders use motion vectors with half pixel (or "1/2-pel") accuracy and bilinear interpolation. The
first version of Telenor's encoder also used 1/2-pel motion vectors and bilinear interpolation. The latest version of
Telenor's encoder, however, incorporated 1/3-pel vectors and cubic-like interpolation because of the additional
compression gains. Specifically, at a given macroblock, Telenor's encoder estimates the best motion vector in the two steps
shown in FIG. 2. First, the Telenor encoder searches for the best integer-pel vector V1 (FIG. 1) 100. Second, the Telenor
encoder searches for the best 1/3-pixel accurate vector V1/3 (FIG. 1) near V1 102. This second step is shown graphically
in FIG. 1, where a total of eight blocks (each having an array of 16x16 pixels) in the 3x3 interpolated reference frame are
checked to find the best match. The motion vectors for these eight blocks are represented by the eight solid dots in the
grid centered on V1. In FIG. 1 the best match is the block associated with the motion vector V = (Vx, Vy) = (1+1/3, 1).
[0022] The technology of the present invention allows the encoder to choose among any set of motion accuracies
(for example, 1/2, 1/3, and 1/6-pel accurate motion vectors) using either a full search strategy or a fast search strategy.

Full-Search AMA Search Strategy

[0023] As shown in FIGS. 3 and 4, in the full-search adaptive motion accuracy ("AMA") search strategy the encoder
searches all the motion vector candidates in a grid of 1/6-pixel resolution and a "square radius" (defined herein as a
square block defined by a number of pixels up, a number of pixels down, and a number of pixels to both sides) of five
pixels as shown in FIG. 3. FIG. 4 shows that the first step of the full-search AMA is to search for the best integer-pel
vector V1 (FIG. 1) 104. In the second step of the full-search AMA, the encoder searches for the best 1/6-pixel accurate
vector V1/6 (FIG. 3) near V1 106. In other words, the full-search AMA modifies the second step of Telenor's process
so that the encoder also searches for motion vector candidates in other sub-pixel locations in the velocity space. The
objective is to find the best motion vector in the grid, i.e., the vector that points to the block (in the interpolated reference
frame) that best matches the current macroblock. Although the full-search strategy is computationally complex since it
searches 120 sub-pixel candidates, it shows the full potential of this preferred method of the present invention.
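
The count of 120 candidates follows from the grid geometry: a square radius of five steps gives an 11x11 grid, and removing the center (the integer-pel vector itself) leaves 120 positions. A minimal sketch of the enumeration, in which the function name and the use of exact fractions are assumptions made for illustration:

```python
from fractions import Fraction


def full_search_candidates(v1, n=6, radius=5):
    """List the sub-pixel candidates around v1 on a 1/n-pel grid.

    v1 is the best integer-pel vector (x, y); the center itself is excluded
    because it was already evaluated in the integer-pel step.
    """
    step = Fraction(1, n)
    cx, cy = v1
    return [(cx + dx * step, cy + dy * step)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if (dx, dy) != (0, 0)]


candidates = full_search_candidates((Fraction(1), Fraction(1)))
```

With V1 = (1,1), the list contains positions of 1/2-pel, 1/3-pel, and 1/6-pel accuracy alike, for example (3/2, 1) and (4/3, 1).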
[0024] A critical issue in the motion vector search is the choice of a measure or criterion for establishing which block
is the best match for the given macroblock. In practice, most methods use either the mean squared error ("MSE") or
mean absolute difference ("MAD") criteria. The MSE between two blocks consists of subtracting the pixel values of the
two blocks, squaring the pixel differences, and then taking the average. The MAD between two blocks is a
similar distortion measure, except that the absolute value of the pixel differences is computed instead of the squares. If
two image blocks are similar to each other, the MSE and MAD values will be small. If, however, the image blocks are
dissimilar, these values will be large. Hence, typical video coders find the best match for a macroblock by selecting the
motion vector that produces either the smallest MSE or the smallest MAD. In other words, the block associated with the
best motion vector is the one closest to the given macroblock in an MSE or MAD sense.
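
The two distortion measures just described can be written out directly. The following is a minimal sketch; the function names and the list-of-rows block representation are assumptions for illustration.

```python
def mse(block_a, block_b):
    """Mean squared error: average of the squared pixel differences."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(block_a, block_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def mad(block_a, block_b):
    """Mean absolute difference: average of the absolute pixel differences."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(block_a, block_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Two hypothetical 2x2 blocks that differ in a single pixel by 2.
block_a = [[1, 2], [3, 4]]
block_b = [[1, 2], [3, 6]]
```

For these blocks the MSE is 4/4 = 1.0 and the MAD is 2/4 = 0.5; identical blocks give zero for both measures.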

[0025] Unfortunately, the MSE and MAD distortion measures do not take into account the cost in bits of actually
encoding the vector. For example, a given motion vector may minimize the MSE, but it may be very costly to encode
with bits, so it may not be the best choice from a coding standpoint.

[0026] To deal with this, advanced encoders such as those described by Telenor use rate-distortion ("RD") criteria
of the type "distortion + L*Bits" to select the best motion vector. The value of "distortion" is typically the MSE or MAD,
"L" is a constant that depends on the compression level (i.e., the quantization step size), and "Bits" is the number of bits
required to code the motion vector. In general, any RD criteria of this type would work with the present invention.
However, in the present invention "Bits" include the bits needed for encoding the vector and those for encoding the accuracy
of the vector. In fact, some candidates can have several "Bits" values, because they can have several accuracy modes.
For example, the candidate at location (1/2, -1/2) can be thought of as having 1/2 or 1/6 pixel accuracy.
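
To make the accuracy-dependent "Bits" term concrete, the sketch below evaluates "distortion + L*Bits" for one candidate under each accuracy mode in which it can be represented. Several parts are assumptions made only for illustration: the accuracy-mode code lengths are taken as {1, 2, 2} for 1/2-, 1/3-, and 1/6-pel; `vector_bits` is a hypothetical stand-in (the length of a signed Exp-Golomb code) for the codec's real VLC table; and the vector is coded directly rather than as a difference from a prediction.

```python
from fractions import Fraction

# Bits spent on signalling the accuracy mode, keyed by N for 1/N-pel accuracy.
# The lengths {1, 2, 2} are an assumption for this sketch.
ACCURACY_BITS = {2: 1, 3: 2, 6: 2}


def valid_accuracies(vx, vy):
    """Return every N such that (vx, vy) lies on the 1/N-pel grid."""
    return [n for n in ACCURACY_BITS
            if (vx * n).denominator == 1 and (vy * n).denominator == 1]


def vector_bits(steps):
    """Hypothetical code length for a component of `steps` grid increments.

    Uses the length of a signed Exp-Golomb code; a real codec would look
    the length up in its own VLC table.
    """
    index = 2 * abs(steps) - (1 if steps > 0 else 0)
    return 2 * (index + 1).bit_length() - 1


def rd_cost(distortion, lam, vx, vy, n):
    """Rate-distortion cost: distortion + lam * (accuracy bits + vector bits)."""
    bits = ACCURACY_BITS[n] + vector_bits(int(vx * n)) + vector_bits(int(vy * n))
    return distortion + lam * bits


vx, vy = Fraction(1, 2), Fraction(-1, 2)
modes = valid_accuracies(vx, vy)
costs = {n: rd_cost(100, 2, vx, vy, n) for n in modes}
```

Under these assumptions the candidate (1/2, -1/2) is representable at 1/2-pel (one increment per component) and at 1/6-pel (three increments per component), and the 1/2-pel interpretation is cheaper because fewer increments have to be coded.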

Fast-Search AMA Search Strategy 

[0027] As shown in FIGS. 5 and 6, in the fast-search adaptive motion accuracy ("AMA") search strategy the
encoder checks only a small set of the motion vector candidates. In the first step of the fast-search AMA, the encoder
checks the eight motion vector candidates in a grid of 1/2-pixel resolution of square radius 1, which is centered on
V1 108. V2 is then set to denote the candidate that has the smallest RD cost (i.e., the best of the eight previous vectors
and V1) 110. Next, the encoder checks the eight motion vector locations in a grid of 1/6-pixel resolution of square radius 1
that is now centered on V2 112. If V2 has the smallest RD cost 114, the encoder stops its search and selects V2 as the
motion vector for the block. Otherwise, V3 is set to denote the best motion vector of the eight 116. The encoder then
searches for new motion vector candidates in the grid of 1/6-pixel resolution of square radius 1 that is centered on V3
118. It should be noted that some of the candidates in this grid have already been tested and can be skipped. The
candidate with the smallest RD cost in this last step is selected as the motion vector for the block 120.
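
Steps 108-120 can be sketched as a short routine. This is an illustrative reading of the paragraph above, not the encoder's code: `fast_search`, `neighbors`, and the `rd_cost` callback are hypothetical names, the RD cost is supplied by the caller, and a dictionary is used so that candidates that were already tested are not evaluated again.

```python
from fractions import Fraction


def neighbors(center, step):
    """The eight grid positions of square radius 1 around center."""
    cx, cy = center
    return [(cx + dx * step, cy + dy * step)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def fast_search(v1, rd_cost):
    """Return (best vector, number of RD evaluations) starting from integer-pel v1."""
    tested = {}

    def cost(v):
        if v not in tested:          # skip candidates that were already tested
            tested[v] = rd_cost(v)
        return tested[v]

    half, sixth = Fraction(1, 2), Fraction(1, 6)
    # Steps 108-110: eight 1/2-pel candidates around V1; V2 is the best of those and V1.
    v2 = min([v1] + neighbors(v1, half), key=cost)
    # Step 112: eight 1/6-pel candidates around V2.
    best_of_eight = min(neighbors(v2, sixth), key=cost)
    # Step 114: stop if V2 still has the smallest RD cost.
    if cost(v2) <= cost(best_of_eight):
        return v2, len(tested)
    # Step 116: V3 is the best of the eight.
    v3 = best_of_eight
    # Steps 118-120: check the 1/6-pel grid around V3 and keep the cheapest candidate.
    return min([v3] + neighbors(v3, sixth), key=cost), len(tested)


# Hypothetical convex cost with its minimum at (1 + 1/3, 1), as in the FIG. 1 example.
def toy_cost(v):
    return (v[0] - Fraction(4, 3)) ** 2 + (v[1] - 1) ** 2


best, evaluations = fast_search((Fraction(1), Fraction(1)), toy_cost)
```

With this toy cost the search moves from V1 = (1,1) to V2 = (3/2, 1) and then to V3 = (4/3, 1), evaluating 20 positions in total (19 of them sub-pixel) instead of the 120 sub-pixel candidates of the full search.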

[0028] Experimental data has shown that, on average, this simple fast search strategy typically checks the RD cost
of about eighteen locations in sub-pixel space (ten more than Telenor's search strategy), and hence the overall
computational complexity is only moderately increased.

[0029] The experimental data discussed below in connection with FIGS. 8-18 show that there is practically no loss
in compression performance from using this fast-search version of AMA. This is because the fast-search AMA search
strategy exploits the convexity of the "distortion + L*Bits" curve (cf. "distortion" is known to be convex), by creating a
path that smartly follows the RD cost from higher to lower levels.

[0030] Alternate embodiments of the invention replace one or more of the steps 108-120. These embodiments
have also been effective and have further reduced the number of motion vector candidates to check in the sub-pixel
velocity space.

[0031] FIG. 7, for example, checks candidates of 1/3-pel accuracy. In this embodiment step 112 is replaced by one
of three possible scenarios. First, if the best motion vector candidate from step 110 is at the center (the "integer-pel
vector") 130, then the encoder checks three candidates of 1/3-pel accuracy between the center vector and the 1/2-pel
location with the next lowest RD cost 132. Second, if the best motion vector candidate from step 110 is a corner
vector 134, then the encoder checks the four vector candidates of 1/3-pel accuracy that are closest to such corner 136.
Third, if the best motion vector candidate from step 110 is between two corners 138, then the encoder determines
which of these two corners has lower RD cost and checks the four vector candidates of 1/3-pel accuracy that are closest
to the line between such corner and the best candidate from step 110 140. It should be noted that in implementing this
process step 138 may be unnecessary because if V2 is neither at the center nor a corner vector, then it would
necessarily be between two corners. If the encoder is set to find motion vectors with 1/3-pixel accuracy, FIG. 7 could be modified
to end rather than continuing with step 114.
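
The three scenarios depend only on where V2 lies relative to the integer-pel center V1. A minimal sketch of that classification follows; the function name and the returned labels are assumptions, and the selection of the specific 1/3-pel candidates in steps 132, 136, and 140 is deliberately left out because the text does not fully specify it.

```python
from fractions import Fraction


def classify_half_pel_winner(v1, v2):
    """Classify V2 (the best of V1 and its eight 1/2-pel neighbors) relative to V1.

    Returns "center" (step 130), "corner" (step 134), or "between_corners" (step 138).
    """
    dx, dy = v2[0] - v1[0], v2[1] - v1[1]
    if dx == 0 and dy == 0:
        return "center"
    if dx != 0 and dy != 0:
        return "corner"
    # Exactly one component differs, so V2 sits midway between two corners.
    return "between_corners"


v1 = (Fraction(1), Fraction(1))
labels = [classify_half_pel_winner(v1, v2) for v2 in (
    (Fraction(1), Fraction(1)),
    (Fraction(3, 2), Fraction(3, 2)),
    (Fraction(3, 2), Fraction(1)),
)]
```

The final branch needs no explicit test, which mirrors the observation in the text that step 138 may be unnecessary.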

Computation And Memory Savings 


[0032] Because step 108 checks only motion vector candidates of 1/2-pixel accuracy, the computation and memory
requirements for the hardware or software implementation are significantly reduced. To be specific, in a smart
implementation embodiment of this fast-search the reference frame is interpolated by 2x2 in order to obtain the RD costs for
the 1/2-pel vector candidates. A significant amount of fast (or cache) memory for a hardware or software encoder is
saved as compared to Telenor's approach that needed to interpolate the reference frame by 3x3. In comparison to the
Telenor encoder, this is a cache memory savings of 9/4 or a factor of 2.25. The few additional interpolations can be done
later on a block-by-block basis.
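
The factor of 2.25 is simply the ratio of sample counts between a 3x3 and a 2x2 interpolated frame. A minimal check of that arithmetic, using the standard QCIF luminance size of 176x144 pixels as an example frame (the function name is an assumption for illustration):

```python
def interpolated_samples(width, height, factor):
    """Number of samples in a frame up-sampled by factor x factor."""
    return (width * factor) * (height * factor)


qcif_width, qcif_height = 176, 144
samples_3x3 = interpolated_samples(qcif_width, qcif_height, 3)
samples_2x2 = interpolated_samples(qcif_width, qcif_height, 2)
savings = samples_3x3 / samples_2x2
```

The ratio does not depend on the frame size, since the width and height cancel and only 9/4 remains.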

[0033] Additionally, since the interpolations in step 108 are used to direct the search towards the lower values of
the RD cost function, a complex filter is not needed for these interpolations. Accordingly, computation power may be
saved by using a simple bilinear filter for step 108.
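
A simple bilinear filter of the kind mentioned above can be sketched as a 2x2 up-sampling routine. This is only an illustration under stated assumptions: the function name is hypothetical, the output covers the positions between existing pixels only (no border extension), and results are left as unrounded averages, whereas a real codec defines its own rounding and border rules.

```python
def bilinear_upsample_2x(frame):
    """Up-sample a frame (list of rows) to 1/2-pel resolution by bilinear averaging."""
    height, width = len(frame), len(frame[0])
    out = [[0.0] * (2 * width - 1) for _ in range(2 * height - 1)]
    for r in range(2 * height - 1):
        for c in range(2 * width - 1):
            r0, c0 = r // 2, c // 2
            # For odd coordinates, average with the next pixel in that direction.
            r1 = min(r0 + (r % 2), height - 1)
            c1 = min(c0 + (c % 2), width - 1)
            out[r][c] = (frame[r0][c0] + frame[r0][c1]
                         + frame[r1][c0] + frame[r1][c1]) / 4
    return out


half_pel_frame = bilinear_upsample_2x([[0, 2], [4, 6]])
```

Integer positions keep their original values, positions between two pixels get the average of the two, and the position in the middle of four pixels gets the average of all four.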

[0034] Also, other key coding decisions such as selecting the mode of a macroblock (e.g., 16x16, four-8x8, etc.)
can be done using the 1/2-pel vectors because such decisions do not benefit significantly from using higher accuracies.
Then, the encoder can use a more complex cubic filter to interpolate the required sub-pixel values for the few additional
vector candidates to check in the remaining steps. Since the macroblock mode has already been chosen, these final
interpolations only need to be done for the chosen mode.

[0035] Use of multiple filters obtained computation savings of over twenty percent in running time on a Sparc Ultra
10 Workstation in comparison to Telenor's approach, which uses a cubic interpolation all the time. Additionally, the
fast-memory requirements were reduced by nearly half. Also, there was little or no loss in compression performance.
Compared with one preferred embodiment of the fast-search, Benzler's technique requires about 70 interpolations per
pixel in the Telenor encoder, whereas the present invention requires only about 7 interpolations per pixel.

Coding The Motion Vector And Accuracies With Bits 

[0036] Once the best motion vector and accuracy are determined, the encoder encodes both the motion vector and
accuracy values with bits. One approach is to encode the motion vector with a given accuracy (e.g., half-pixel accuracy)
and then add some extra bits for refining the vector to the higher motion accuracy. This is the strategy suggested by B.
Girod, but it is sub-optimal in a rate-distortion sense.

[0037] In one preferred embodiment of the present invention, the accuracy of the motion vector for a macroblock is
first encoded using a simple code such as the one given in Table 1. Any other table with code lengths {1, 2, 2} could be
used as well. The bit rate could be further reduced using a typical DPCM approach.



Table 1. VLC table to indicate the accuracy mode for a given macroblock.

Code    Motion Accuracy
1       1/2-pel
01      1/3-pel
00      1/6-pel

Next, the value of the vector(s) in the respective accuracy space is encoded. These bits can be obtained from entries of
a single VLC table such as the one used in the H26L codec. The key idea is that these bits are interpreted differently
depending on the motion accuracy for the macroblock. For example, if the motion accuracy is 1/3 and the code bits for
the X component of the difference motion vector are 000011, the X component of the vector is Vx = 2/3. If the accuracy
is 1/2, such code corresponds to Vx = 1.
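
The reinterpretation of the same table entry under different accuracies amounts to counting increments on the sub-pixel grid. In the sketch below the example code bits are read as a table entry that stands for two increments, which is an inference from the two values given in the text (2/3 at 1/3 accuracy, 1 at 1/2 accuracy); the function names are assumptions, and the mapping from code bits to an increment count is left to the codec's own VLC table.

```python
from fractions import Fraction


def component_from_increments(increments, accuracy):
    """Decoder side: turn a decoded increment count into a vector component."""
    return increments * accuracy


def increments_from_component(component, accuracy):
    """Encoder side: count the increments of the given accuracy in a component."""
    count = component / accuracy
    if count.denominator != 1:
        raise ValueError("component is not on the grid of this accuracy")
    return int(count)


third, half = Fraction(1, 3), Fraction(1, 2)
vx_third = component_from_increments(2, third)   # same table entry, 1/3-pel mode
vx_half = component_from_increments(2, half)     # same table entry, 1/2-pel mode
```

Because only the increment count is coded, one table can serve every accuracy mode; the accuracy signalled for the macroblock tells the decoder how large each increment is.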

[0038] Compared to the Benzler method for encoding the motion vectors with a variable length code ("VLC") table
that could be used for encoding 1/2 and 1/4 pixel accurate vectors, the method of the present invention can be used for
encoding vectors of any motion accuracy and the table can be interpreted differently at each frame and macroblock.
Further, the general method of the present invention can be used for any motion accuracy, not necessarily those that
are multiples of each other or those that are of the type 1/n (with n an integer). The number of increments in the given
sub-pixel space is simply counted and the bits in the associated entry of the table are used as the code.
[0039] From the decoder's viewpoint, once the motion accuracy is decoded, the motion vector can also be easily
decoded. After that, the associated block in the previous frame is reconstructed using a typical 4-tap cubic interpolator.
There is a different 4-tap filter for each motion accuracy.

[0040] The AMA does not increase decoding complexity, because the number of operations needed to reconstruct
the predicted block is the same, regardless of the motion accuracy.

Experimental Results

[0041] FIGS. 8-18 show test results of the Telenor encoder with and without AMA in a variety of video
sequences, resolutions, and frame rates, as described in Table 2. These figures show rate-distortion ("RD") plots for
each case. The "Anchor" curve shows RD points from optimized H.263+ (FIGS. 8 and 9 only). The "Telenor 1/2+b"
curve shows Telenor with 1/2-pel vectors and bilinear interpolation (the "classical case"). The "Telenor 1/3" curve shows
the current Telenor proposal (the "Telenor encoder"). The "Telenor+AMA+c" curve shows the Telenor encoder with the
full-search strategy of the present invention. The "Telenor+FSAMA+c" curve, as shown in FIGS. 15-17, shows the current
Telenor encoder with the fast-search strategy. (Unless otherwise specified, the full-search version of AMA was the encoder
strategy used in the experiments.) All of the test results were cross-checked at the encoder and decoder. These results
show that with AMA the gains in peak signal-to-noise ratio ("PSNR") can be as high as 1 dB over H26L, and even higher
over the classical case.

[0042] The video sequences are commonly used by the video coding community, except for "Paris Shaked." The
latter is a synthetic sequence obtained by shifting the well-known sequence "Paris" by a motion vector whose X and Y
components take a random value within [-1,1]. This synthetic sequence simulates small movements caused by a
hand-held camera in a typical video phone scene.

Comparison Of Full-Search And Fast-Search AMA

[0043] The experimental results shown in FIGS. 16 and 17 demonstrate that the encoder performance with
fast-search ("Telenor+FSAMA+c") and full-search ("Telenor+AMA+c") strategies for AMA is practically the same. This is true
because the fast-search strategies exploit the convexity of the RD cost curve in the sub-pixel velocity space. In other
words, since the shape of the RD cost follows a smooth convex curve, its minimum should be easy to find with some
smart fast-search schemes that descend down the curve.
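
A descent of this kind, following the three stages recited in FIG. 6 and claims 1 through 6, might be sketched as follows. The sketch makes several assumptions that are not in the specification: vectors are integer pairs in units of 1/6 pixel, rd_cost is a hypothetical callable supplied by the encoder, and the center of each grid competes with its eight neighbors.

```python
def neighbors(center, step):
    """Eight candidates on a square of radius 1 (in units of step) around center."""
    cx, cy = center
    return [(cx + dx * step, cy + dy * step)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def fast_search(v1, rd_cost):
    """Three-stage sub-pixel search; returns (best vector, candidates evaluated).

    v1      -- best integer-pel vector, in 1/6-pixel units (multiples of 6)
    rd_cost -- hypothetical callable giving the RD cost of a candidate vector
    """
    tested = {v1: rd_cost(v1)}

    def refine(center, step):
        # Evaluate only candidates not tested before, then pick the cheapest
        # vector of the 3x3 grid (ties favor the center).
        grid = [center] + neighbors(center, step)
        for cand in grid:
            if cand not in tested:
                tested[cand] = rd_cost(cand)
        return min(grid, key=lambda c: tested[c])

    v2 = refine(v1, 3)    # 1/2-pixel grid of radius 1 around V1
    v3 = refine(v2, 1)    # 1/6-pixel grid of radius 1 around V2
    if v3 == v2:          # V2 already has the smallest cost: skip the last stage
        return v2, len(tested)
    best = refine(v3, 1)  # 1/6-pixel grid around V3, reusing earlier results
    return best, len(tested)
```

Under these assumptions the search evaluates at most 25 candidates, and fewer when grids overlap or when the last stage is skipped.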

Combining AMA And Multiple Reference Frames 

[0044] In the plot shown in FIG. 16, the curves labeled "1r" used only one reference frame for the motion
compensation, so these curves are the same as those presented in FIG. 10. The curves labeled "5r" used five reference frames.
[0045] The experiments show that the gains with AMA add to those obtained using multiple reference frames. The 
gain from AMA in the one-reference case can be measured by comparing the green and pink curves, and the gain in 
the five-reference case can be measured between the blue and red curves. 

[0046] It should be noted that the present invention may be implemented at the frame level so that different frames
could use different motion accuracies, but within a frame all motion vectors would use the same accuracy. Preferably in
this embodiment the motion vector accuracy would then be signaled only once at the frame layer. Experiments have
shown that using the best, fixed motion accuracy for the whole frame should also produce compression gains comparable
to those presented here for the macroblock-adaptive case.

[0047] In another frame-based embodiment the encoder could do motion compensation on the entire frame with
the different vector accuracies and then select the best accuracy according to the RD criteria. This approach is not
suitable for pipeline, one-pass encoders, but it could be appropriate for software-based or more complex encoders. In still
another frame-based embodiment the encoder could use previous statistics and/or formulas to predict what will be the
best accuracy for a given frame (e.g., the formulas set forth in the Ribas work or a variation thereof can be used). This
approach would be well-suited for one-pass encoders, although the performance gains would depend on the precision
of the formulas used for the prediction.
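
The multi-pass selection described above can be sketched as follows, using a cost of the type "distortion + L*Bits". The function name, the dictionary layout, and the accuracy labels are assumptions made for illustration only.

```python
def select_frame_accuracy(trials, lam):
    """Pick the motion accuracy whose frame-level cost D + lam * R is smallest.

    trials -- dict mapping an accuracy label (e.g. "1/2", "1/3", "1/6") to a
              (distortion, bits) pair measured by motion-compensating the
              whole frame at that accuracy (hypothetical encoder output)
    lam    -- Lagrange multiplier, the "L" in "distortion + L*Bits"
    """
    costs = {acc: d + lam * r for acc, (d, r) in trials.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

Because every accuracy must be tried on the whole frame before the choice is made, this sketch corresponds to the multi-pass embodiment rather than the one-pass, prediction-based embodiment.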

[0048] The terms and expressions which have been employed in the foregoing specification are used therein as
terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of
excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the
invention is defined and limited only by the claims that follow.

Claims 

1. A fast-search adaptive motion accuracy search method for estimating motion vectors in motion-compensated video
coding by finding a best motion vector for a macroblock, said method comprising the steps of:

(a) searching a first set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square
radius centered on V1 to find a best motion vector V2;

(b) searching a second set of motion vector candidates in a grid of sub-pixel resolution of a predetermined
square radius centered on V2 to find a best motion vector V3; and

(c) searching a third set of motion vector candidates in a grid of sub-pixel resolution of a predetermined square
radius centered on V3 to find said best motion vector of said macroblock.

2. The method of claim 1, said step of searching a first set of motion vector candidates in a grid of sub-pixel resolution
of a predetermined square radius centered on V1 to find a best motion vector V2 further comprising the step of
searching a first set of eight motion vector candidates in a grid of 1/2-pixel resolution of square radius 1 centered
on V1 to find a best motion vector V2.

3. The method of claim 1, said step of searching a second set of motion vector candidates in a grid of sub-pixel
resolution of a predetermined square radius centered on V2 to find a best motion vector V3 further comprising the step
of searching a second set of eight motion vector candidates in a grid of 1/6-pixel resolution of square radius 1
centered on V2 to find a best motion vector V3.

4. The method of claim 1 further comprising the steps of using V2 as the motion vector for the block if V2 has the
smallest rate-distortion cost and skipping step (c) of claim 1.

5. The method of claim 1, said step of searching a third set of motion vector candidates in a grid of sub-pixel resolution
of a predetermined square radius centered on V3 to find said best motion vector of said macroblock further
comprising the step of searching a third set of eight motion vector candidates in a grid of 1/6-pixel resolution of square
radius 1 centered on V3 to find said best motion vector of said macroblock.

6. The method of claim 1, said step of searching a third set of motion vector candidates in a grid of sub-pixel resolution
of a predetermined square radius centered on V3 to find said best motion vector of said macroblock further
comprising the step of skipping motion vector candidates of said third set of motion vector candidates that have already
been tested.

7. The method of claim 1 further wherein said step of searching said first set of motion vector candidates further
comprises the step of searching said first set of motion vector candidates using a first filter to do a first interpolation,
said step of searching said second set of motion vector candidates further comprises the step of searching said
second set of motion vector candidates using a second filter to do a second interpolation, and said step of
searching said third set of motion vector candidates further comprises the step of searching said third set of motion vector
candidates using a third filter to do a third interpolation.

8. The method of claim 1, said step of searching a second set of motion vector candidates in a grid of sub-pixel
resolution of a predetermined square radius centered on V2 to find a best motion vector V3 further comprising the
steps of:

(a) searching three candidates of 1/3-pel accuracy between V2 and a 1/2-pel location with the next lowest RD cost
if V2 is at the center;

(b) searching four vector candidates of 1/3-pel accuracy that are closest to V2 if V2 is a corner vector; and

(c) determining which of two corners has lower RD cost and searching four vector candidates of 1/3-pel
accuracy that are closest to a line between said corner with lower RD cost, if V2 is between two corner vectors.

9. An adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by
finding a best motion vector for a macroblock, said method comprising the steps of:

(a) searching a first set of motion vector candidates in a grid centered on V1 to find a best motion vector V2
using a first filter to do a first interpolation;

(b) searching a second set of motion vector candidates in a grid centered on V2 to find a best motion vector V3
using a second filter to do a second interpolation; and

(c) searching a third set of motion vector candidates in a grid centered on V3 to find said best motion vector of
said macroblock using a third filter to do a third interpolation.

10. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises 
using a simple filter to do a coarse interpolation. 

11. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises
using a simple filter to do a coarse interpolation and said step of searching using a second filter to do a second
interpolation further comprises using a complex filter to do a fine interpolation.

12. The method of claim 11 wherein said step of searching using a third filter to do a third interpolation further
comprises using a complex filter to do a fine interpolation.

13. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises 
using a bilinear filter to interpolate the reference frame by 2x2. 

14. The method of claim 9 wherein said step of searching using a first filter to do a first interpolation further comprises
using a bilinear filter to interpolate the reference frame by 2x2 and said step of searching using a second filter to do
a second interpolation further comprises using a cubic filter to do a fine interpolation.

15. The method of claim 14 wherein said step of searching using a third filter to do a third interpolation further
comprises using a cubic filter to do a fine interpolation.

16. An adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by
finding a best motion vector for a macroblock, said method comprising the steps of:

(a) searching at a first motion accuracy for a first best motion vector of said macroblock;

(b) encoding said first best motion vector and said first motion accuracy;

(c) searching for at least one second best motion vector of said macroblock at an at least one second motion
accuracy;

(d) encoding said at least one second best motion vector and said at least one second motion accuracy; and

(e) selecting the best motion vector of said first and at least one best motion vectors using rate-distortion
criteria.

17. The method of claim 16 wherein said step of selecting the best motion vector using rate-distortion criteria further
comprises the step of said rate-distortion criteria adapting according to the different motion accuracies to
determine both the best motion vectors and the best motion accuracies.

18. The method of claim 16, said step of searching for at least one second best motion vector at an at least one second
motion accuracy further comprising the step of searching for at least one second best motion vector of said
macroblock at an at least one second motion accuracy that is finer than said first motion accuracy.

19. The method of claim 16 wherein said step of selecting the best motion vector using rate-distortion criteria further
comprises the step of using rate-distortion criteria of the type "distortion + L*Bits" to select the best motion vector.

20. An adaptive motion accuracy search method for estimating motion vectors in motion-compensated video coding by
finding a best motion vector for a macroblock, said method comprising the steps of:

(a) searching at a motion accuracy for a best motion vector of said macroblock;

(b) encoding said motion accuracy using a code from a VLC table that is interpreted differently at different
coding units according to the associated motion vector accuracy; and

(c) encoding said best motion vector in the respective accuracy space.

21. A system for estimating motion vectors in motion-compensated video coding by finding a best motion vector for a
macroblock, said system comprising:

(a) a first encoder for searching a first set of motion vector candidates in a grid of sub-pixel resolution of a
predetermined square radius centered on V1 to find a best motion vector V2;

(b) a second encoder for searching a second set of motion vector candidates in a grid of sub-pixel resolution
of a predetermined square radius centered on V2 to find a best motion vector V3; and

(c) a third encoder for searching a third set of motion vector candidates in a grid of sub-pixel resolution of a
predetermined square radius centered on V3 to find said best motion vector of said macroblock.

22. The system of claim 21 wherein said first, second, and third encoders are a single encoder. 

FIG. 2

The encoder searches for the best integer-pel vector V1.
The encoder searches for the best 1/3-pixel accurate vector V1/3 near V1.

FIG. 3

FIG. 4

The encoder searches for the best integer-pel vector V1.
The encoder searches for the best 1/6-pixel accurate vector V1/6 near V1.

FIG. 6

Checking the eight motion vector candidates in a grid of 1/2-pixel resolution of radius 1 which is centered on V1.
Denoting the candidate that has the smallest RD cost as V2.
Checking the eight motion vector locations in a grid of 1/6-pixel resolution of radius 1 that is now centered on V2.
Denoting the best motion vector of the eight as V3.
Checking the new motion vector candidates in the grid of 1/6-pixel resolution of radius 1 that is centered on V3.
Observe that some of the candidates in the grid have already been tested and can be skipped.
Selecting the candidate with the smallest RD cost as the motion vector for the block.

FIG. 7

Determining which of the two corners has lower RD cost and checking the four vector candidates of 1/3-pel
accuracy that are closest to the line between such corner and V2.